{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Supervised Learning Algorithms: Kernelized Support Vector Machines"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*In this template, only **data input** and **input/target variables** need to be specified (see \"Data Input & Variables\" section for further instructions). None of the other sections needs to be adjusted. As a data input example, .csv file from IBM Box web repository is used.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Libraries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Run to import the required libraries.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib notebook\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from sklearn.model_selection import train_test_split"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Data Input and Variables"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Define the data input as well as the input (X) and target (y) variables and run the code. Do not change the data & variable names **['df', 'X', 'y']** as they are used in further sections.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "### Data Input\n",
    "# df = \n",
    "\n",
    "### Defining Variables  \n",
    "# X = \n",
    "# y = \n",
    "\n",
    "### Data Input Example \n",
    "df = pd.read_csv('https://ibm.box.com/shared/static/q6iiqb1pd7wo8r3q28jvgsrprzezjqk3.csv')\n",
    "\n",
    "X = df[['horsepower']]\n",
    "y = df['price']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. The Model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Run to build the SVM with both default radial basis function (RBF) and polynomial kernel.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Accuracy of RBF-kernel SVC on training set: 0.37\n",
      "Accuracy of RBF-kernel SVC on test set: 0.00\n"
     ]
    }
   ],
   "source": [
    "from sklearn.svm import SVC\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)\n",
    "\n",
    "\n",
    "\n",
    "# The default SVC kernel is radial basis function (RBF)\n",
    "clf = SVC().fit(X_train, y_train)\n",
    "\n",
    "print('Accuracy of RBF-kernel SVC on training set: {:.2f}'\n",
    "     .format(clf.score(X_train, y_train)))\n",
    "print('Accuracy of RBF-kernel SVC on test set: {:.2f}'\n",
    "     .format(clf.score(X_test, y_test)))\n",
    "\n",
    "### THIS MIGHT TAKE A WHILE\n",
    "# # Compare decision boundries with polynomial kernel, degree = 3\n",
    "# clf = SVC(kernel = 'poly', degree = 3).fit(X_train, y_train)\n",
    "\n",
    "# print('Accuracy of poly-kernel SVC on training set: {:.2f}'\n",
    "#      .format(clf.score(X_train, y_train)))\n",
    "# print('Accuracy of poly-kernel SVC on test set: {:.2f}'\n",
    "#      .format(clf.score(X_test, y_test)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.1. Support Vector Machine with RBF kernel: gamma parameter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SVM (RBF) with gamma = 1e-05\n",
      "Accuracy of SVM (RBF) classifier on training set: 0.09\n",
      "Accuracy of SVM (RBF) classifier on test set: 0.00\n",
      "\n",
      "SVM (RBF) with gamma = 100\n",
      "Accuracy of SVM (RBF) classifier on training set: 0.37\n",
      "Accuracy of SVM (RBF) classifier on test set: 0.00\n",
      "\n"
     ]
    }
   ],
   "source": [
    "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)\n",
    "\n",
    "for this_gamma in [0.00001, 100]:\n",
    "    clf = SVC(kernel = 'rbf', gamma=this_gamma).fit(X_train, y_train)\n",
    "    print('SVM (RBF) with gamma = {}'.format(this_gamma))\n",
    "    print('Accuracy of SVM (RBF) classifier on training set: {:.2f}'\n",
    "         .format(clf.score(X_train, y_train)))\n",
    "    print('Accuracy of SVM (RBF) classifier on test set: {:.2f}\\n'\n",
    "         .format(clf.score(X_test, y_test)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2. Support Vector Machine with RBF kernel: using both C and gamma parameter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SVM (RBF) with gamma = 0.01 and C = 0.1\n",
      "Accuracy of SVM (RBF) classifier on training set: 0.09\n",
      "Accuracy of SVM (RBF) classifier on test set: 0.00\n",
      "\n",
      "SVM (RBF) with gamma = 0.01 and C = 1\n",
      "Accuracy of SVM (RBF) classifier on training set: 0.15\n",
      "Accuracy of SVM (RBF) classifier on test set: 0.00\n",
      "\n",
      "SVM (RBF) with gamma = 0.01 and C = 15\n",
      "Accuracy of SVM (RBF) classifier on training set: 0.33\n",
      "Accuracy of SVM (RBF) classifier on test set: 0.00\n",
      "\n",
      "SVM (RBF) with gamma = 0.01 and C = 250\n",
      "Accuracy of SVM (RBF) classifier on training set: 0.37\n",
      "Accuracy of SVM (RBF) classifier on test set: 0.00\n",
      "\n",
      "SVM (RBF) with gamma = 1 and C = 0.1\n",
      "Accuracy of SVM (RBF) classifier on training set: 0.11\n",
      "Accuracy of SVM (RBF) classifier on test set: 0.00\n",
      "\n",
      "SVM (RBF) with gamma = 1 and C = 1\n",
      "Accuracy of SVM (RBF) classifier on training set: 0.37\n",
      "Accuracy of SVM (RBF) classifier on test set: 0.00\n",
      "\n",
      "SVM (RBF) with gamma = 1 and C = 15\n",
      "Accuracy of SVM (RBF) classifier on training set: 0.37\n",
      "Accuracy of SVM (RBF) classifier on test set: 0.00\n",
      "\n",
      "SVM (RBF) with gamma = 1 and C = 250\n",
      "Accuracy of SVM (RBF) classifier on training set: 0.37\n",
      "Accuracy of SVM (RBF) classifier on test set: 0.00\n",
      "\n",
      "SVM (RBF) with gamma = 5 and C = 0.1\n",
      "Accuracy of SVM (RBF) classifier on training set: 0.11\n",
      "Accuracy of SVM (RBF) classifier on test set: 0.00\n",
      "\n",
      "SVM (RBF) with gamma = 5 and C = 1\n",
      "Accuracy of SVM (RBF) classifier on training set: 0.37\n",
      "Accuracy of SVM (RBF) classifier on test set: 0.00\n",
      "\n",
      "SVM (RBF) with gamma = 5 and C = 15\n",
      "Accuracy of SVM (RBF) classifier on training set: 0.37\n",
      "Accuracy of SVM (RBF) classifier on test set: 0.00\n",
      "\n",
      "SVM (RBF) with gamma = 5 and C = 250\n",
      "Accuracy of SVM (RBF) classifier on training set: 0.37\n",
      "Accuracy of SVM (RBF) classifier on test set: 0.00\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from sklearn.svm import SVC\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)\n",
    "\n",
    "for this_gamma in [0.01, 1, 5]:\n",
    "    \n",
    "    for this_C in [0.1, 1, 15, 250]:\n",
    "        title = 'gamma = {:.2f}, C = {:.2f}'.format(this_gamma, this_C)\n",
    "        clf = SVC(kernel = 'rbf', gamma = this_gamma, C = this_C).fit(X_train, y_train)\n",
    "        print('SVM (RBF) with gamma = {} and C = {}'.format(this_gamma, this_C))\n",
    "        print('Accuracy of SVM (RBF) classifier on training set: {:.2f}'\n",
    "             .format(clf.score(X_train, y_train)))\n",
    "        print('Accuracy of SVM (RBF) classifier on test set: {:.2f}\\n'\n",
    "             .format(clf.score(X_test, y_test)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.3. SVMs with normalized data (feature preprocessing using minmax scaling)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cars dataset (normalized with MinMax scaling)\n",
      "RBF-kernel SVC (with MinMax scaling) training set accuracy: 0.09\n",
      "RBF-kernel SVC (with MinMax scaling) test set accuracy: 0.00\n"
     ]
    }
   ],
   "source": [
    "from sklearn.preprocessing import MinMaxScaler\n",
    "scaler = MinMaxScaler()\n",
    "X_train_scaled = scaler.fit_transform(X_train)\n",
    "X_test_scaled = scaler.transform(X_test)\n",
    "\n",
    "clf = SVC(C=10).fit(X_train_scaled, y_train)\n",
    "print('Cars dataset (normalized with MinMax scaling)')\n",
    "print('RBF-kernel SVC (with MinMax scaling) training set accuracy: {:.2f}'\n",
    "     .format(clf.score(X_train_scaled, y_train)))\n",
    "print('RBF-kernel SVC (with MinMax scaling) test set accuracy: {:.2f}'\n",
    "     .format(clf.score(X_test_scaled, y_test)))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
